%This document describes the contribution of the PhD.
%It serves the purpose of setting the core of the PhD thesis and 
%forming the Discussion chapter. Early chapters such as
%literature reviews and introductions can be extracted from
%this document.
%
\documentclass{article}
\usepackage{fullpage}
\linespread{1.3}


\begin{document}


\title{Contribution of the PhD thesis}
\author{Sheng,Yu}
\maketitle


\section{Context of the PhD thesis}
Emerging EHR standards are changing existing e-health infrastructure to enable communication between different systems. This is a result of attempts to address the limitations of traditional EPR-based systems. The sharing and exchange of clinical information is increasingly supported by EHR-based communication. Two-level model EHR standards such as EN13606 and openEHR use an approach based on the so-called Archetype Object Model (AOM), which allows clinical experts to design the technical artifacts used in the system development cycle and in later information communication: archetypes, which are blueprints of the information to be shared and exchanged.


To build an e-health ecosystem that promotes semantic interoperability, EHR systems incorporate terminology systems. Terminological resources can be provided and consumed in many ways to ensure that meaningful information is exchanged and to reduce ambiguity. Terminology resources can be embedded as external references in archetypes. Although clinical terminologies are not new to e-health, in the author's view they are not exploited enough in the newly emerging two-level model based EHR systems. Although large and sophisticated clinical terminology systems like SNOMED-CT are supported in EHR systems, many features of SNOMED-CT are not used.


On the other hand, problems related to agreement on clinical concept modelling in archetypes, rather than purely technical problems, are on the rise. A community-based approach allows freely designed archetypes to propagate rapidly but also leads to archetype management problems. The archetype design process involves building concepts around the information that is exchanged between different systems. No guideline has been developed explicitly on how archetypes should be imbued with clinical concepts from standard terminologies. Moreover, when using a sophisticated terminology like SNOMED-CT (which can itself be used as a language to model and form a clinical statement), overlaps and SNOMED-specific features obscure the term-binding task. Although a dedicated language (ADL) exists for creating archetypes, the scarcity of term bindings in the existing corpus of publicly available archetypes makes it evident that terminology plays very little part in the development of community-level archetypes. In addition, term-binding is left to a few experts in the last step of the archetype creation process and is often ignored. It is difficult for terminology experts to step in, because the EHR model is not designed to interact with terminology systems.


A common terminology is essential to communicating parties who wish to achieve semantic interoperability. The integration of terminology with EHR standards therefore becomes vital. To encourage the use of common terminology, EHR information models were designed to support the embedding of terms and codes from terminology systems. The HL7 information standard, for instance, defines a ``vocabulary'' model [ref hl7 vocabulary domain] to accommodate terminology-related information for HL7 objects and attributes. However, even with the support of an EHR information model, great care should be taken both during the development of systems and when clinicians use the system to encode clinical information.
Great effort has been made to develop guidelines for embedding terminology in the HL7 messaging standard. By comparison, two-level model based EHR standards lack the ability to work seamlessly with terminology. To bridge the gap, a thorough study on harmonising the two-level model EHR and terminology systems is needed; this study focused on two elements, the AOM and SNOMED-CT, two sophisticated technologies that represent the state of the art in EHR information models and terminology systems respectively.


In the author's opinion, the binding mechanism for terminology resources in archetypes is not well used. Possible causes are that SNOMED-CT is difficult to learn and use; that archetype designers may not be aware of the clinical concepts they are composing (due in part to a lack of guidelines); and that tools neither offer automatic suggestions nor provide guidance to archetype designers. This study started by investigating the mechanism for linking terms in archetypes with terms in SNOMED-CT, and showed how `Shadows' can enhance the archetype modelling approach.






\section{Overview of contribution}


To summarise: this thesis complements the two-level model EHR approach by incorporating a well-established clinical terminology system, SNOMED-CT.
The author studied the correspondences between SNOMED-CT and archetypes, and investigated the applicability of integrating the two. This thesis has introduced and described the creation of a process to generate a semantic artifact that the author has called a ``terminological Shadow''.
The author has verified the approach by carrying out a set of evaluations and then demonstrated the usefulness of ``terminological Shadows'' in three specific scenarios:
\begin{enumerate}
\item Helping term-binding for clinical archetypes, 
\item Reviewing clinical concept coverage, 
\item Comparing clinical archetypes.
\end{enumerate}
The thesis has demonstrated that ``terminological Shadows'' benefit the existing technologies that manage and maintain archetypes.
%With ``terminological Shadows”, these targets must be achieved in a manual manner or use %existing techniques in a more generic area which may not suit the task such as information %retrieval and text mining.[a]




\subsection{Initiatives}
In the author's opinion, the manual annotation and insertion of SNOMED-CT codes in archetypes is a gradual and continuous task. The approach presented here, particularly the automatic SNOMED-CT binding mechanism, does not aim to replace the effort of terminology experts, nor the manual classification and categorisation of archetypes by experts. Rather, assuming that manual annotation and categorisation proceed more slowly than the number of archetypes grows, since the review process usually lags behind the design process [maybe ref Alberto's thesis?], automatic categorisation can offer a better overview of archetypes. This has several benefits, as mentioned in [ref to paper 2 stuff].


As indicated in the literature, a number of research projects and industrial organisations have shown interest in adopting archetypes [ref to growing archetype repositories in different nations]. In a community-based approach, archetype designers create their own versions of what should be communicated between EHR systems. Managing slightly different archetype designs based on similar clinical concepts becomes problematic: because the archetype evaluation cycle is slow, manual inspection is needed to judge whether two archetypes are close in semantic meaning. Automatic comparison can help to reduce the workload by identifying and highlighting the similarities, so that experts can apply their expertise at a deeper level.


%(similar to archetype comparison, the most relevant archetypes are returned)
Similar to the need for searching and finding desired documents, archetypes need to be properly indexed, since a large or general archetype may contain a large number of clinical concepts. Otherwise one has to first locate the archetype closest to the topic, then inspect each node in it to find the concept one is looking for. A simple string matching approach may not satisfy this need, since the medical concepts are not taken into account. This method therefore aims to reduce the manual work associated with searching for clinical concepts in archetypes, while also providing experts with what they are familiar with: concepts from SNOMED-CT.


\subsection{Difference to terminology service}


This work differs from the maturing common terminology service technology (such as CTS2), whose intention is to provide a unified interface for accessing and utilising terminology resources. The implementation of a common terminology service will ease the development of client applications that access multiple terminology resources. The focus of this study, however, is to discover the inner links between the EHR information model and the terminology model. It will enable users to use terminology resources more appropriately and thus promote semantic interoperability.


\subsection{What is new}


In contrast to projects like TermInfo, which investigated how to use SNOMED-CT with HL7 messages, the contribution of this project is to reveal the relationship between two modelling approaches, the Archetype Object Model and SNOMED-CT. It creates an intermediate representation called a `terminological shadow', which takes the advantages and strengths of both to aid the development of a semantic environment for EHRs.


What is new in this approach is that context information from both archetypes and the structure of SNOMED-CT is taken into account and used as metadata. By analogy, this is similar to what the Semantic Web does for web resources: an ontology is used to formally represent the knowledge in, and the relationships between, documents. With a much narrower scope than the Semantic Web, the approach taken in this thesis uses SNOMED-CT as a lexical resource and ontology to aid manual tasks (such as creating and maintaining archetypes). By using the intermediate format, the terminological shadow (which contains both the structural information from archetypes and the possible corresponding concepts from SNOMED-CT), one can discover the coverage of existing archetypes, compare archetypes to find similarities, and use shadows as a query tool to find relevant archetypes. Without the shadow approach, these tasks are not automated in the current environment. \textbf{On existing platforms, these tasks are typically handled by:}


\begin{itemize}
    \item Manual classification and categorisation of archetypes (this task can be daunting with a large number of archetypes);
    \item Manual inspection of archetypes if comparison is needed (possibly requiring a full cycle of archetype review);
    \item Third-party tools to search and query archetypes (using string matching techniques).
\end{itemize}


\textbf{With terminological shadows one can:}
\begin{itemize}
    \item Automatically categorise archetypes and track coverage with respect to SNOMED-CT;
    \item Compare archetypes (the semantic similarity is loosely based on the concept model in SNOMED-CT);
    \item Execute queries on archetypes that take into account the context information from archetypes and from the Reference Model.
\end{itemize}


\subsection{Description of the study}


The core of this thesis is the investigation of two modelling approaches that aim to achieve semantic interoperability among heterogeneous systems that utilise Electronic Health Records: standards that allow health experts to model clinical information, exemplified by the Archetype Object Model, and general clinical vocabularies such as SNOMED-CT. Since neither offers a complete solution to the problems that occur during communication in different clinical scenarios, there is interest in integrating the two to reduce the ambiguity of clinical information.
The TermInfo project led by the HL7 organisation is one example. One important output of that project is a set of guidelines for the usage of HL7 classes and SNOMED-CT concepts.


After a set of quantitative studies that investigated the existing resources for modelling clinical information using archetypes and SNOMED-CT, it has been shown that the two modelling approaches are not mutually exclusive; in fact, many overlaps were identified between archetypes and SNOMED-CT. The author has shown through a set of studies that, despite their different structures, syntax and usage, archetypes and SNOMED-CT have a great deal of similarity and equivalence. By studying the existing manual work done by clinical experts who model and create re-usable artifacts for clinical communication, one can mine the information in these artifacts to find patterns and use it as a metric to produce an intermediate format. This intermediate representation (for which the term `terminological shadow' is coined) can be used to semi-automate several tasks to the benefit of end users (both the original composers of such information and the users who consume it), such as tracking the clinical coverage of a particular medical topic, identifying redundant modelling to improve modelling efficiency, and indexing and retrieving metadata artifacts (archetypes).


\section{Details of contribution}


\subsection{Discovery of the semantic gap}


In the area of health informatics, the gaps between EHR information models and medical terminology systems have not been the focus of research during the evolution of electronic health records. However, as numerous studies have pointed out and attempted to address, a number of semantic interoperability issues are explicitly or implicitly related to differences between the EHR information model and clinical terminology. These differences, also referred to as semantic gaps, often manifest as incompatibilities that arise when incorporating terminology into EHRs. One example is that, despite their specialist clinical knowledge, human experts may find it difficult to associate appropriate concepts in a clinical terminology with an existing EHR information model. This is because similar concepts collide in the two systems and their usages differ in syntax and structure. One simple solution is to include concepts that are frequently used in EHR information models in a master terminology. In fact SNOMED-CT, as a large terminology, has already been adopting concepts from EHR models, e.g.\ EHR context.
While the incompatibility is generally considered a result of the independent development of the two systems, the author believes a generic study is required in this field.


Various projects have attempted to address some of these semantic interoperability issues; e.g.\ the TermInfo project produced guidelines for integrating HL7 and SNOMED-CT. However, the outcome was HL7-specific. The author found that a generic study is needed to investigate the bond between EHR models and terminologies, especially in the emerging two-level EHR models such as EN13606 and openEHR, where the number of archetypes grows significantly, in contrast to the number of terminology bindings.


a) There are currently four publicly available archetype repositories. Archetypes are designed for general use in EHR information communication. Archetype modelling tends to cover as many clinical areas as possible.


b) Archetypes do not always contain terminological references. At the time of writing (November 2011), the terminological references that do exist in archetypes are not widely utilised.


c) With the increasing number of archetypes, there is a general lack of detailed guidelines to help archetype modellers design coherently and meaningfully.
Issues manifest themselves when users wish to find archetypes, navigate archetypes, decide whether to make new archetypes, create the internal structure of archetypes, and so on.


\subsection{Initial investigation}


In this thesis the author investigated the Archetype-Terminology binding mechanism specified by the AOM to link archetypes to SNOMED-CT. The investigation formed the foundation of the conceptual idea of a ``Terminological shadow''. The results of the investigation show that:


a) The main binding method involves the manual creation of the binding by a clinical expert, during or after the archetype creation process. Statistics on existing manual bindings showed that the number of manual bindings is relatively low compared with the number of archetypes being created. A possible cause is the modelling approach developed by the openEHR organisation, which leaves the binding to be completed at a more specific stage of developing an EHR system. It is said that bindings can be added in the so-called Templates to fit a particular clinical usage [ref]. In the author's view, there may be more benefit in binding terminology to Archetypes, which are self-contained information artefacts consisting of clinical meta-data, than at a lower level such as Templates. SNOMED-CT bindings that are embedded in an archetype can provide more functionality to clinical users and EHR platform developers.


b) Few tools help experts decide which SNOMED-CT concepts should be used in a given archetype. A small number of tools have been developed to help clinical experts bind archetypes to SNOMED-CT. However, most of them focus on particular functionality such as navigating or searching SNOMED-CT. The author believes that, because gaps and similarities exist between the two, search or navigation tools can provide assistance at the human-computer interaction level, but not at the semantic level.


\subsection{Algorithm investigation}


Binding and mapping medical text to clinical concepts from terminology systems often involves algorithmic methods that process the text and the terminology. Many implementations of such methods have their root in the area of Natural Language Processing and Information Retrieval.
In this thesis, the author has explored different algorithms that can be used to map archetypes to SNOMED-CT concepts. The steps involved in this investigation are as follows:


a) The author adopted a tf-idf based search algorithm and evaluated its performance in providing candidate SNOMED-CT concepts that can be bound to archetypes. A tf-idf based tool such as Lucene can be used as an automatic suggestion system that requires no human intervention.
The study used existing manual bindings as the reference standard to evaluate the accuracy of the algorithm. The results showed that it performed well for a selected set of archetypes.




b) To eliminate false positives in the study, a more sophisticated evaluation was carried out to test the performance of this binding method. The results showed that the performance of the algorithm is quite stable. The conclusion was that this method can be applied to larger data sets to evaluate the semantics of archetypes.
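
As a point of reference for the tf-idf ranking described above, the textbook weighting can be sketched as follows. This is a minimal illustration under stated assumptions, not the scoring function of Lucene (whose practical scoring differs in detail, for example through additional normalisation factors), and the symbols $t$, $d$, $q$, $N$ and $\mathrm{df}(t)$ are notation introduced here rather than taken from the studies. It is assumed that each SNOMED-CT concept description is indexed as a document $d$ and that the text of an archetype node acts as the query $q$:
\[
\mathrm{tfidf}(t,d) = \mathrm{tf}(t,d)\cdot\log\frac{N}{\mathrm{df}(t)},
\qquad
\mathrm{score}(q,d) = \sum_{t \in q} \mathrm{tfidf}(t,d),
\]
where $\mathrm{tf}(t,d)$ is the number of occurrences of term $t$ in $d$, $N$ is the number of indexed descriptions and $\mathrm{df}(t)$ is the number of descriptions containing $t$. Candidate concepts are then ranked by $\mathrm{score}(q,d)$. One possible way to measure agreement with the manual bindings, assumed here because the exact metric is not stated in this document, is the fraction of manually bound nodes whose bound concept appears among the top $k$ candidates:
\[
\mathrm{hit}_k = \frac{|\{\, n \in B : b(n) \in \mathrm{top}_k(n) \,\}|}{|B|},
\]
where $B$ is the set of archetype nodes that carry a manual binding, $b(n)$ is the concept manually bound to node $n$, and $\mathrm{top}_k(n)$ is the set of the $k$ highest-ranked candidates for $n$.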


\subsection{Conceptualisation of Shadows}


Next, the author extended the binding algorithm to create `Shadows' of archetypes:


a) Based on earlier studies, an intermediate format called a `Shadow' was created to reflect the clinical concepts in archetypes. This format mainly contains the SNOMED-CT concepts of archetypes and context information from the reference model.


b) It was also shown that shadows can be generated using different binding algorithms, which allows more accurate algorithms to be developed independently. Shadows themselves are also useful resources for understanding the archetype-SNOMED relationship, and are not limited to SNOMED-CT and two-level model based EHRs.




c) The author complemented the mapping process with a different algorithm to demonstrate the flexibility of Shadows. MetaMap was used instead of Lucene, and a similar performance test was conducted for comparison with the previous results.
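
To make the description of the intermediate format more concrete, a Shadow can be pictured as a set of annotated nodes. The following is only a schematic sketch; the names $\mathrm{nodes}(A)$, $r(n)$ and $C_f(n)$ are hypothetical notation introduced here, and the actual format may carry further information. For an archetype $A$ and a binding algorithm $f$ (for example a tf-idf based search or MetaMap):
\[
\mathrm{Shadow}_f(A) = \{\, (n,\; r(n),\; C_f(n)) : n \in \mathrm{nodes}(A) \,\},
\]
where $n$ ranges over the nodes of the archetype, $r(n)$ is the context of $n$ taken from the reference model, and $C_f(n)$ is the list of candidate SNOMED-CT concepts that $f$ proposes for $n$. Writing the algorithm as a parameter $f$ reflects the point that the same format can be produced by different binding algorithms.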


\subsection{Shadow utilisation}
The author demonstrated that a shadow is a multi-purpose tool that can be used to enhance the archetype modelling approach in several ways.


\subsubsection{Coverage discovery by shadows}
 The author utilised Shadows to discover clinical concept coverage of an archetype repository:


a) Shadows were created from a large number of archetypes to view the coverage of clinical concepts in the repository, using SNOMED-CT as a metric. This study has a number of benefits. A full overview of a repository can guide archetype modellers in balancing their creation of archetypes across the clinical areas of interest. The study also reflected the relationship between the Archetype Object Model and the SNOMED-CT concept model.


b) The largest archetype repository, the NHS archetypes, was used to generate shadows. The resulting shadows revealed that certain clinical areas have an adequate number of archetypes for use in their clinical scenarios, while others were identified as under-covered.
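
One simple way to quantify the coverage described above is sketched below. It is an assumed formulation for illustration only, since the exact coverage measure is not given in this document; the symbols $S$, $H$ and $\mathrm{cov}$ are introduced here. Let $S$ be the set of distinct SNOMED-CT concepts that occur in the shadows of a repository, and let $H$ be a SNOMED-CT hierarchy of interest, identified by its root concept:
\[
\mathrm{cov}(H) = \frac{|\{\, c \in S : c \preceq H \,\}|}{|S|},
\]
where $c \preceq H$ means that $c$ is the root of $H$ or one of its descendants in the is-a hierarchy. A hierarchy with a small share $\mathrm{cov}(H)$ relative to its clinical importance would be a candidate for an under-covered area, although that judgement remains with the experts.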








\subsubsection{Archetype comparison by shadows}
In this thesis shadows were used to compare archetypes:


a) A semantic similarity measure from graph theory was applied to shadows to enable comparison between archetypes.
Shadows were generated on the fly and used for comparison with other archetypes. A semantic similarity score was calculated to evaluate how ``close'' two archetypes are semantically. A more detailed view can also be generated within archetypes to show their relationship (which data nodes are closest and what their context is). This work can aid the management of archetypes by identifying potentially redundant archetypes for merging to improve efficiency.


b) The results also showed that the lowest-common-ancestor method can be used to identify similar nodes among archetypes. The comparison functionality can also be used in other ways, such as concept visualisation and quality control.
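
The lowest-common-ancestor idea can be illustrated with one widely used measure, the Wu--Palmer similarity. The exact measure used in the studies is not given in this document, so this is an assumed stand-in; $\mathrm{depth}$, $\mathrm{lca}$, $C_A$ and $C_B$ are notation introduced here. For two SNOMED-CT concepts $c_1$ and $c_2$ in the is-a hierarchy:
\[
\mathrm{sim}(c_1,c_2) = \frac{2\,\mathrm{depth}(\mathrm{lca}(c_1,c_2))}{\mathrm{depth}(c_1)+\mathrm{depth}(c_2)},
\]
where $\mathrm{depth}(c)$ counts the concepts on a path from the root to $c$ (so the root has depth 1) and $\mathrm{lca}(c_1,c_2)$ is the deepest concept that is an ancestor of both. Because a SNOMED-CT concept may have more than one parent, a convention for choosing the path and the common ancestor has to be fixed. As a worked example, if $\mathrm{depth}(c_1)=5$, $\mathrm{depth}(c_2)=6$ and their lowest common ancestor has depth 4, then $\mathrm{sim}(c_1,c_2) = 8/11 \approx 0.73$. A score for two archetypes $A$ and $B$ could then be aggregated, again as an assumption, by matching each concept of $A$ to its closest concept in $B$:
\[
\mathrm{Sim}(A,B) = \frac{1}{|C_A|}\sum_{a \in C_A} \max_{b \in C_B} \mathrm{sim}(a,b),
\]
where $C_A$ and $C_B$ are the sets of SNOMED-CT concepts in the shadows of $A$ and $B$. This aggregate is not symmetric; averaging $\mathrm{Sim}(A,B)$ and $\mathrm{Sim}(B,A)$ is one way to make it so.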


\section*{}
One-sentence:
The core of the project is the investigation of the correlation between two essential components of Electronic Health Records, both of which aim to achieve semantic interoperability in a health environment composed of heterogeneous EHR systems. In a nutshell, this work utilised terminology resources in archetypes to bring semantic enhancement to the two-level EHR model.



%todo -- IMPORTANT!!! Contribution: the shadow approach can be used for machine learning purpose as a whole new method for creating intelligent term binding, as a statistic resource and training resource, combined with human annotation, to provide a NLP-like training service for a EHR model (e.g openEHR training data, HL7 training data etc)




\end{document}

